Financial Time Series Models

Theoretical Framework

Overview

Having established the relationship between NBA performance and betting markets, we now shift our focus to the financial behavior of sports betting stocks themselves. In this chapter, these firms are treated as an independent financial ecosystem shaped by sports events, regulatory changes, and broader market forces. Financial returns exhibit volatility clustering, where periods of high or low volatility tend to persist. Because ARIMA models assume constant variance, they are not well suited for data with this property. GARCH models address this limitation by explicitly capturing time-varying volatility. The key insight is that while returns may be unpredictable due to market efficiency, volatility often follows predictable patterns that are essential for risk management, options pricing, and portfolio optimization.


Code
library(tidyverse)
library(ggplot2)
library(forecast)
library(astsa)
library(xts)
library(tseries)
library(fpp2)
library(fma)
library(lubridate)
library(TSstudio)
library(quantmod)
library(tidyquant)
library(plotly)
library(imputeTS)
library(dplyr)
library(reshape2)
library(gridExtra)
library(zoo)
library(kableExtra)
library(vars)
library(patchwork)
library(moments)
library(fGarch)

theme_set(theme_minimal(base_size = 12))
Code
# Load all stock data
dkng <- read_csv("data/financial/DKNG_daily.csv", show_col_types = FALSE)
colnames(dkng) <- gsub("Adj Close", "Adj_Close", colnames(dkng))
dkng <- dkng %>%
    mutate(Date = as.Date(Date)) %>%
    arrange(Date)

penn <- read_csv("data/financial/PENN_daily.csv", show_col_types = FALSE)
colnames(penn) <- gsub("Adj Close", "Adj_Close", colnames(penn))
penn <- penn %>%
    mutate(Date = as.Date(Date)) %>%
    arrange(Date)

czr <- read_csv("data/financial/CZR_daily.csv", show_col_types = FALSE)
colnames(czr) <- gsub("Adj\\.Close", "Adj_Close", colnames(czr))

if ("Adj_Close" %in% colnames(czr)) {
    czr$Adj_Close <- as.numeric(czr$Adj_Close)
}
czr <- czr %>%
    mutate(Date = as.Date(Date)) %>%
    arrange(Date)

mgm <- read_csv("data/financial/MGM_daily.csv", show_col_types = FALSE)
colnames(mgm) <- gsub("Adj\\.Close", "Adj_Close", colnames(mgm))

if ("Adj_Close" %in% colnames(mgm)) {
    mgm$Adj_Close <- as.numeric(mgm$Adj_Close)
}
mgm <- mgm %>%
    mutate(Date = as.Date(Date)) %>%
    arrange(Date)

Calculate Returns

Financial time series models typically work with returns rather than prices, so we calculate log returns (continuously compounded):

\[ r_t = \ln(P_t) - \ln(P_{t-1}) = \ln\left(\frac{P_t}{P_{t-1}}\right) \]
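As a quick numerical check (a standalone sketch; the price values are made up), log returns are close to simple returns for small moves, and unlike simple returns they are additive across periods:

```r
# Log vs. simple returns on a toy price path (hypothetical values)
prices <- c(100, 105, 102)

log_ret <- diff(log(prices))                 # continuously compounded
simple_ret <- diff(prices) / head(prices, -1)

round(log_ret, 4)      # 0.0488 -0.0290
round(simple_ret, 4)   # 0.0500 -0.0286

# Log returns sum to the total-period log return; simple returns do not
sum(log_ret)           # equals log(102/100)
```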

Code
# Calculate log returns for all stocks
dkng$LogReturn <- c(NA, diff(log(dkng$Adj_Close)))
dkng$Return_Pct <- dkng$LogReturn * 100
dkng$Ticker <- "DKNG"
dkng <- dkng %>% filter(!is.na(LogReturn))

penn$LogReturn <- c(NA, diff(log(penn$Adj_Close)))
penn$Return_Pct <- penn$LogReturn * 100
penn$Ticker <- "PENN"
penn <- penn %>% filter(!is.na(LogReturn))

czr$LogReturn <- c(NA, diff(log(czr$Adj_Close)))
czr$Return_Pct <- czr$LogReturn * 100
czr$Ticker <- "CZR"
czr <- czr %>% filter(!is.na(LogReturn))

mgm$LogReturn <- c(NA, diff(log(mgm$Adj_Close)))
mgm$Return_Pct <- mgm$LogReturn * 100
mgm$Ticker <- "MGM"
mgm <- mgm %>% filter(!is.na(LogReturn))

# Summary statistics for all stocks
stocks_list <- list(
    DKNG = dkng$Return_Pct,
    PENN = penn$Return_Pct,
    CZR = czr$Return_Pct,
    MGM = mgm$Return_Pct
)

cat("\n=== RETURNS SUMMARY STATISTICS ===\n\n")

=== RETURNS SUMMARY STATISTICS ===
Code
for (ticker in names(stocks_list)) {
    returns <- stocks_list[[ticker]]
    cat(ticker, ":\n")
    cat("  Mean:", round(mean(returns), 4), "%\n")
    cat("  Std Dev:", round(sd(returns), 4), "%\n")
    cat("  Min:", round(min(returns), 4), "%\n")
    cat("  Max:", round(max(returns), 4), "%\n")
    cat("  Skewness:", round(moments::skewness(returns), 4), "\n")
    cat("  Kurtosis:", round(moments::kurtosis(returns), 4), "\n\n")
}
DKNG :
  Mean: 0.0638 %
  Std Dev: 4.2354 %
  Min: -32.6061 %
  Max: 15.9306 %
  Skewness: -0.3807 
  Kurtosis: 7.8417 

PENN :
  Mean: -0.022 %
  Std Dev: 4.9443 %
  Min: -59.4142 %
  Max: 29.8592 %
  Skewness: -1.7046 
  Kurtosis: 30.3981 

CZR :
  Mean: -0.0718 %
  Std Dev: 4.5517 %
  Min: -47.0084 %
  Max: 36.5733 %
  Skewness: -0.9193 
  Kurtosis: 21.3669 

MGM :
  Mean: -0.0042 %
  Std Dev: 3.4071 %
  Min: -40.9684 %
  Max: 28.6042 %
  Skewness: -0.9679 
  Kurtosis: 26.9113 
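The large kurtosis values above (a normal distribution has kurtosis 3) can be tested formally. A minimal sketch using `tseries::jarque.bera.test` (the tseries package is loaded above; `stocks_list` comes from the previous chunk):

```r
# Jarque-Bera test of normality for each return series:
# H0 is that skewness = 0 and excess kurtosis = 0
for (ticker in names(stocks_list)) {
    jb <- tseries::jarque.bera.test(stocks_list[[ticker]])
    cat(ticker, ": JB =", round(jb$statistic, 1),
        ", p =", format.pval(jb$p.value, digits = 3), "\n")
}
```

Given the heavy tails documented above, all four series should reject normality decisively.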

Data Visualization & Stationarity

Code
p1 <- ggplot(dkng, aes(x = Date, y = Adj_Close)) +
    geom_line(color = "#006bb6", linewidth = 0.8) +
    labs(
        title = "DraftKings (DKNG) Stock Price",
        subtitle = "Daily Adjusted Close Price",
        x = "Date", y = "Price ($)"
    ) +
    theme_minimal()

p2 <- ggplot(dkng, aes(x = Date, y = Return_Pct)) +
    geom_line(color = "darkred", linewidth = 0.5, alpha = 0.7) +
    geom_hline(yintercept = 0, linetype = "dashed", color = "black") +
    labs(
        title = "DraftKings (DKNG) Daily Returns",
        subtitle = "Log returns showing volatility clustering",
        x = "Date", y = "Return (%)"
    ) +
    theme_minimal()

p1 / p2

Observations:

  • Price Series: Non-stationary with clear trends. DKNG went public via SPAC merger in 2020, experienced a pandemic boom, then declined as competition intensified.
  • Returns Series: Appears stationary around zero mean, but exhibits clear volatility clustering. Periods of calm (2021-early 2022) alternate with turbulence (late 2022, 2024).
Code
# Calculate rolling volatility (20-day window)
dkng <- dkng %>%
    arrange(Date) %>%
    mutate(
        Roll_Vol_20 = zoo::rollapply(Return_Pct, width = 20, FUN = sd, fill = NA, align = "right"),
        Abs_Return = abs(Return_Pct)
    )

p3 <- ggplot(dkng, aes(x = Date, y = Abs_Return)) +
    geom_line(color = "purple", alpha = 0.6) +
    labs(
        title = "Absolute Returns (Proxy for Volatility)",
        x = "Date", y = "|Return| (%)"
    ) +
    theme_minimal()

p4 <- ggplot(dkng, aes(x = Date, y = Roll_Vol_20)) +
    geom_line(color = "darkgreen", linewidth = 0.8) +
    labs(
        title = "20-Day Rolling Volatility",
        x = "Date", y = "Volatility (% std dev)"
    ) +
    theme_minimal()

p3 / p4

Volatility Clustering Evidence:

The rolling volatility plot clearly shows time-varying variance. Volatility is not constant: it rises sharply during market stress and falls during stable periods. This visual evidence motivates GARCH modeling.

Code
# ACF/PACF of returns
par(mfrow = c(1, 2))
acf(dkng$Return_Pct, lag.max = 30, main = "ACF: DKNG Returns")
pacf(dkng$Return_Pct, lag.max = 30, main = "PACF: DKNG Returns")

Code
par(mfrow = c(1, 1))
Code
# ACF/PACF of squared returns (test for ARCH effects)
par(mfrow = c(1, 2))
acf(dkng$Return_Pct^2, lag.max = 30, main = "ACF: Squared Returns (DKNG)")
pacf(dkng$Return_Pct^2, lag.max = 30, main = "PACF: Squared Returns (DKNG)")

Code
par(mfrow = c(1, 1))

Key Findings:

  • Returns ACF/PACF: Minimal autocorrelation at most lags (consistent with weak-form market efficiency). Returns themselves are largely unpredictable.
  • Squared Returns ACF/PACF: Strong autocorrelation at multiple lags. This is the signature of ARCH effects: while returns are unpredictable, volatility is highly predictable and persistent.
Code
# ADF test on returns
adf_dkng <- adf.test(dkng$Return_Pct)

cat("=== Augmented Dickey-Fuller Test ===\n")
=== Augmented Dickey-Fuller Test ===
Code
cat("Null Hypothesis: Series has a unit root (non-stationary)\n\n")
Null Hypothesis: Series has a unit root (non-stationary)
Code
cat("ADF Statistic:", round(adf_dkng$statistic, 4), "\n")
ADF Statistic: -10.1901 
Code
cat("p-value:", round(adf_dkng$p.value, 4), "\n")
p-value: 0.01 
Code
cat("Conclusion:", ifelse(adf_dkng$p.value < 0.05,
    "Reject H0 - Returns are STATIONARY",
    "Fail to reject H0 - Returns are NON-STATIONARY"
), "\n")
Conclusion: Reject H0 - Returns are STATIONARY 

Stationarity Conclusion: The ADF test confirms that DKNG returns are stationary (the reported p = 0.01 is the floor of the test's lookup table, so p ≤ 0.01). This validates our use of returns rather than prices and confirms that we don’t need differencing in the mean equation.
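As a complementary check (a sketch; `kpss.test` is in the tseries package, loaded above), the KPSS test reverses the hypotheses — its null is stationarity — so agreement between the two tests strengthens the conclusion:

```r
# KPSS test: null hypothesis is (level) stationarity,
# the opposite of the ADF null
kpss_dkng <- kpss.test(dkng$Return_Pct, null = "Level")
cat("KPSS statistic:", round(kpss_dkng$statistic, 4), "\n")
cat("p-value:", round(kpss_dkng$p.value, 4), "\n")
# Failing to reject here (p > 0.05) agrees with the ADF rejection:
# both point to stationary returns
```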


Code
p5 <- ggplot(penn, aes(x = Date, y = Adj_Close)) +
    geom_line(color = "#006bb6", linewidth = 0.8) +
    labs(
        title = "Penn Entertainment (PENN) Stock Price",
        subtitle = "Daily Adjusted Close Price",
        x = "Date", y = "Price ($)"
    ) +
    theme_minimal()

p6 <- ggplot(penn, aes(x = Date, y = Return_Pct)) +
    geom_line(color = "darkred", linewidth = 0.5, alpha = 0.7) +
    geom_hline(yintercept = 0, linetype = "dashed", color = "black") +
    labs(
        title = "Penn Entertainment (PENN) Daily Returns",
        subtitle = "Log returns showing volatility clustering",
        x = "Date", y = "Return (%)"
    ) +
    theme_minimal()

p5 / p6

Observations:

  • Price Series: Even more volatile than DKNG. PENN’s business model shifted from Barstool Sports (2020-2023) to ESPN BET (2023-present), creating structural breaks and extreme volatility.
  • Returns Series: Strong volatility clustering with several extreme spikes. This suggests PENN may exhibit even stronger GARCH effects than DKNG.
Code
penn <- penn %>%
    arrange(Date) %>%
    mutate(
        Roll_Vol_20 = zoo::rollapply(Return_Pct, width = 20, FUN = sd, fill = NA, align = "right"),
        Abs_Return = abs(Return_Pct)
    )

p7 <- ggplot(penn, aes(x = Date, y = Abs_Return)) +
    geom_line(color = "purple", alpha = 0.6) +
    labs(
        title = "Absolute Returns (Proxy for Volatility)",
        x = "Date", y = "|Return| (%)"
    ) +
    theme_minimal()

p8 <- ggplot(penn, aes(x = Date, y = Roll_Vol_20)) +
    geom_line(color = "darkgreen", linewidth = 0.8) +
    labs(
        title = "20-Day Rolling Volatility",
        x = "Date", y = "Volatility (% std dev)"
    ) +
    theme_minimal()

p7 / p8

Code
par(mfrow = c(1, 2))
acf(penn$Return_Pct, lag.max = 30, main = "ACF: PENN Returns")
pacf(penn$Return_Pct, lag.max = 30, main = "PACF: PENN Returns")

Code
par(mfrow = c(1, 1))
Code
par(mfrow = c(1, 2))
acf(penn$Return_Pct^2, lag.max = 30, main = "ACF: Squared Returns (PENN)")
pacf(penn$Return_Pct^2, lag.max = 30, main = "PACF: Squared Returns (PENN)")

Code
par(mfrow = c(1, 1))

Key Findings:

Similar to DKNG: returns show weak autocorrelation, but squared returns exhibit strong autocorrelation, indicating ARCH effects. The PACF of squared returns shows significant spikes at early lags, suggesting GARCH(1,1) may be appropriate.
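The visual ACF evidence can be backed by a formal test. A sketch using base R's `Box.test` on the squared PENN returns (reusing the `penn` data frame from above):

```r
# Ljung-Box test on squared returns as a formal complement to the ACF plots
lb_sq_penn <- Box.test(penn$Return_Pct^2, lag = 12, type = "Ljung-Box")
cat("Q(12) on squared returns:", round(lb_sq_penn$statistic, 2), "\n")
cat("p-value:", format.pval(lb_sq_penn$p.value, digits = 3), "\n")
# A tiny p-value rejects "no autocorrelation in squared returns",
# i.e., it confirms ARCH effects
```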

Code
adf_penn <- adf.test(penn$Return_Pct)

cat("=== Augmented Dickey-Fuller Test ===\n")
=== Augmented Dickey-Fuller Test ===
Code
cat("ADF Statistic:", round(adf_penn$statistic, 4), "\n")
ADF Statistic: -10.6554 
Code
cat("p-value:", round(adf_penn$p.value, 4), "\n")
p-value: 0.01 
Code
cat("Conclusion:", ifelse(adf_penn$p.value < 0.05,
    "Reject H0 - Returns are STATIONARY",
    "Fail to reject H0 - Returns are NON-STATIONARY"
), "\n")
Conclusion: Reject H0 - Returns are STATIONARY 
Code
p_czr1 <- ggplot(czr, aes(x = Date, y = Adj_Close)) +
    geom_line(color = "#006bb6", linewidth = 0.8) +
    labs(
        title = "Caesars Entertainment (CZR) Stock Price",
        x = "Date", y = "Price ($)"
    ) +
    theme_minimal()

p_czr2 <- ggplot(czr, aes(x = Date, y = Return_Pct)) +
    geom_line(color = "darkred", linewidth = 0.5, alpha = 0.7) +
    geom_hline(yintercept = 0, linetype = "dashed", color = "black") +
    labs(
        title = "Caesars Entertainment (CZR) Daily Returns",
        x = "Date", y = "Return (%)"
    ) +
    theme_minimal()

p_czr1 / p_czr2

Code
czr <- czr %>%
    arrange(Date) %>%
    mutate(Roll_Vol_20 = zoo::rollapply(Return_Pct, width = 20, FUN = sd, fill = NA, align = "right"))

p_czr_vol <- ggplot(czr, aes(x = Date, y = Roll_Vol_20)) +
    geom_line(color = "darkgreen", linewidth = 0.8) +
    labs(title = "CZR: 20-Day Rolling Volatility", y = "Volatility (%)") +
    theme_minimal()

print(p_czr_vol)

Code
par(mfrow = c(1, 2))
acf(czr$Return_Pct^2, lag.max = 30, main = "ACF: Squared Returns (CZR)")
pacf(czr$Return_Pct^2, lag.max = 30, main = "PACF: Squared Returns (CZR)")

Code
par(mfrow = c(1, 1))

Observations: CZR shows volatility clustering typical of casino/gaming stocks, with moderate ARCH effects visible in squared returns ACF.

Code
p_mgm1 <- ggplot(mgm, aes(x = Date, y = Adj_Close)) +
    geom_line(color = "#006bb6", linewidth = 0.8) +
    labs(
        title = "MGM Resorts (MGM) Stock Price",
        x = "Date", y = "Price ($)"
    ) +
    theme_minimal()

p_mgm2 <- ggplot(mgm, aes(x = Date, y = Return_Pct)) +
    geom_line(color = "darkred", linewidth = 0.5, alpha = 0.7) +
    geom_hline(yintercept = 0, linetype = "dashed", color = "black") +
    labs(
        title = "MGM Resorts (MGM) Daily Returns",
        x = "Date", y = "Return (%)"
    ) +
    theme_minimal()

p_mgm1 / p_mgm2

Code
mgm <- mgm %>%
    arrange(Date) %>%
    mutate(Roll_Vol_20 = zoo::rollapply(Return_Pct, width = 20, FUN = sd, fill = NA, align = "right"))

p_mgm_vol <- ggplot(mgm, aes(x = Date, y = Roll_Vol_20)) +
    geom_line(color = "darkgreen", linewidth = 0.8) +
    labs(title = "MGM: 20-Day Rolling Volatility", y = "Volatility (%)") +
    theme_minimal()

print(p_mgm_vol)

Code
par(mfrow = c(1, 2))
acf(mgm$Return_Pct^2, lag.max = 30, main = "ACF: Squared Returns (MGM)")
pacf(mgm$Return_Pct^2, lag.max = 30, main = "PACF: Squared Returns (MGM)")

Code
par(mfrow = c(1, 1))

Observations: MGM exhibits similar volatility clustering patterns, characteristic of the gaming/hospitality sector with sports betting exposure.


Model Fitting

Code
# Prepare data
dkng_returns <- dkng$Return_Pct

# Fit candidate mean models (without GARCH first)
mean_models <- list(
    ARMA00 = arima(dkng_returns, order = c(0, 0, 0)),
    ARMA10 = arima(dkng_returns, order = c(1, 0, 0)),
    ARMA01 = arima(dkng_returns, order = c(0, 0, 1)),
    ARMA11 = arima(dkng_returns, order = c(1, 0, 1))
)

# Compare AIC
mean_aic <- data.frame(
    Model = names(mean_models),
    AIC = sapply(mean_models, AIC),
    BIC = sapply(mean_models, BIC)
)

cat("=== Mean Equation Comparison ===\n\n")
=== Mean Equation Comparison ===
Code
kable(mean_aic,
    format = "html",
    digits = 2,
    caption = "Model Comparison: Mean Equation Only"
) %>%
    kable_styling(
        full_width = FALSE,
        bootstrap_options = c("striped", "hover", "condensed", "responsive")
    ) %>%
    row_spec(which.min(mean_aic$AIC), bold = TRUE, color = "white", background = "#006bb6")
Model Comparison: Mean Equation Only
Model      AIC      BIC
ARMA00   6758.31  6768.46
ARMA10   6758.35  6773.57
ARMA01   6758.42  6773.64
ARMA11   6760.10  6780.39
Code
# Use best mean model from AIC
best_mean_idx <- which.min(mean_aic$AIC)
best_mean_model <- mean_models[[best_mean_idx]]
best_mean_name <- names(mean_models)[best_mean_idx]

cat("\nSelected Mean Model:", best_mean_name, "\n\n")

Selected Mean Model: ARMA00 
Code
# Extract residuals
resid_mean <- residuals(best_mean_model)

# Plot standardized residuals and squared residuals
par(mfrow = c(2, 2))
plot(resid_mean, type = "l", main = "Standardized Residuals", ylab = "Residuals")
abline(h = 0, col = "red", lty = 2)
plot(resid_mean^2, type = "l", main = "Squared Residuals", ylab = "Squared Residuals")
acf(resid_mean^2, lag.max = 30, main = "ACF: Squared Residuals")
pacf(resid_mean^2, lag.max = 30, main = "PACF: Squared Residuals")

Code
par(mfrow = c(1, 1))

# ARCH LM test (ArchTest comes from the FinTS package, called via namespace)
arch_test <- FinTS::ArchTest(resid_mean, lags = 12)
cat("\n=== ARCH LM Test ===\n")

=== ARCH LM Test ===
Code
cat("Null Hypothesis: No ARCH effects\n")
Null Hypothesis: No ARCH effects
Code
cat("Chi-squared:", round(arch_test$statistic, 4), "\n")
Chi-squared: 45.9102 
Code
cat("p-value:", format(arch_test$p.value, scientific = TRUE, digits = 4), "\n")
p-value: 7.19e-06 
Code
cat("Conclusion:", ifelse(arch_test$p.value < 0.05,
    "REJECT H0 - ARCH effects present, GARCH modeling needed",
    "Fail to reject H0 - No ARCH effects"
), "\n")
Conclusion: REJECT H0 - ARCH effects present, GARCH modeling needed 

Interpretation:

  • Squared Residuals: Show clear autocorrelation even after fitting the mean equation
  • ARCH LM Test: Highly significant (p < 0.05), confirming ARCH effects
  • Conclusion: We need to add a GARCH variance equation to properly model volatility

Now we fit complete models with GARCH(1,1) variance equation:
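For reference, the GARCH(1,1) specification being estimated decomposes each return into a (possibly ARMA) conditional mean plus an innovation whose variance evolves over time:

\[ r_t = \mu_t + \varepsilon_t, \qquad \varepsilon_t = \sigma_t z_t, \quad z_t \sim N(0, 1) \]

\[ \sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2 \]

with \(\omega > 0\), \(\alpha_1, \beta_1 \geq 0\), and \(\alpha_1 + \beta_1 < 1\) for covariance stationarity. The sum \(\alpha_1 + \beta_1\) measures volatility persistence.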

Code
# Model specifications to test
model_specs <- list(
    list(formula = ~ garch(1, 1), name = "ARMA(0,0)+GARCH(1,1)"),
    list(formula = ~ arma(1, 0) + garch(1, 1), name = "ARMA(1,0)+GARCH(1,1)"),
    list(formula = ~ arma(0, 1) + garch(1, 1), name = "ARMA(0,1)+GARCH(1,1)"),
    list(formula = ~ arma(1, 1) + garch(1, 1), name = "ARMA(1,1)+GARCH(1,1)")
)

# Fit all models
garch_models <- list()
garch_results <- data.frame()

for (spec in model_specs) {
    # Fit model
    fit <- tryCatch(
        {
            garchFit(formula = spec$formula, data = dkng_returns, trace = FALSE)
        },
        error = function(e) {
            cat("Warning: Failed to fit", spec$name, "\n")
            NULL
        }
    )

    if (!is.null(fit)) {
        garch_models[[spec$name]] <- fit

        # Extract information criteria
        aic_val <- fit@fit$ics["AIC"]
        bic_val <- fit@fit$ics["BIC"]
        loglik_val <- fit@fit$llh

        garch_results <- rbind(garch_results, data.frame(
            Model = spec$name,
            AIC = aic_val,
            BIC = bic_val,
            Log_Likelihood = loglik_val
        ))
    }
}

cat("=== GARCH Model Comparison ===\n\n")
=== GARCH Model Comparison ===
Model Comparison: ARMA+GARCH(1,1) Models for DKNG
Model AIC BIC Log_Likelihood
ARMA(0,0)+GARCH(1,1) 5.5951 5.6123 3297.091
ARMA(1,0)+GARCH(1,1) 5.5899 5.6114 3293.044
ARMA(0,1)+GARCH(1,1) 5.5901 5.6116 3293.172
ARMA(1,1)+GARCH(1,1) 5.5914 5.6172 3292.924
Code
# Select best model
best_garch_idx <- which.min(garch_results$AIC)
best_garch_name <- garch_results$Model[best_garch_idx]
best_garch_model <- garch_models[[best_garch_name]]

cat("\n*** BEST MODEL:", best_garch_name, "***\n\n")

*** BEST MODEL: ARMA(1,0)+GARCH(1,1) ***
Code
cat("Model Coefficients:\n")
Model Coefficients:
Code
print(coef(best_garch_model))
        mu        ar1      omega     alpha1      beta1 
0.06268864 0.06170713 0.06162636 0.02794633 0.96813987 
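The coefficients above imply highly persistent volatility. A quick sketch (using the fitted object from this chunk) computes the persistence α₁ + β₁ and the implied half-life of a volatility shock:

```r
# Volatility persistence and half-life implied by the fitted GARCH(1,1)
cf <- coef(best_garch_model)
persistence <- unname(cf["alpha1"] + cf["beta1"])

# Half-life: days for a volatility shock to decay by half
half_life <- log(0.5) / log(persistence)

cat("Persistence (alpha1 + beta1):", round(persistence, 4), "\n")
cat("Half-life of a volatility shock:", round(half_life, 1), "days\n")
```

With α₁ + β₁ ≈ 0.996, shocks to DKNG volatility take months to die out.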
Code
# Extract standardized residuals
std_resid <- residuals(best_garch_model, standardize = TRUE)

# Diagnostic plots
par(mfrow = c(3, 2))

# 1. Standardized residuals
plot(std_resid, type = "l", main = "Standardized Residuals", ylab = "Std. Residuals")
abline(h = 0, col = "red", lty = 2)
abline(h = c(-2, 2), col = "blue", lty = 2)

# 2. Squared standardized residuals
plot(std_resid^2, type = "l", main = "Squared Standardized Residuals", ylab = "Squared Std. Residuals")

# 3. ACF of standardized residuals
acf(std_resid, lag.max = 20, main = "ACF: Standardized Residuals")

# 4. ACF of squared standardized residuals
acf(std_resid^2, lag.max = 20, main = "ACF: Squared Std. Residuals")

# 5. QQ plot
qqnorm(std_resid, main = "Q-Q Plot: Standardized Residuals")
qqline(std_resid, col = "red")

# 6. Histogram
hist(std_resid, breaks = 50, probability = TRUE, main = "Distribution of Std. Residuals", xlab = "Std. Residuals")
curve(dnorm(x), add = TRUE, col = "red", lwd = 2)

Code
par(mfrow = c(1, 1))
Code
# Ljung-Box test on standardized residuals
cat("\n=== Ljung-Box Test (Standardized Residuals) ===\n")

=== Ljung-Box Test (Standardized Residuals) ===
Code
cat("Null Hypothesis: Residuals are white noise (no autocorrelation)\n\n")
Null Hypothesis: Residuals are white noise (no autocorrelation)
Code
lb_test_resid <- Box.test(std_resid, lag = 10, type = "Ljung-Box", fitdf = 0)
cat("Lag 10:\n")
Lag 10:
Code
cat("  Q-statistic:", round(lb_test_resid$statistic, 4), "\n")
  Q-statistic: 6.2718 
Code
cat("  p-value:", round(lb_test_resid$p.value, 4), "\n")
  p-value: 0.7919 
Code
cat("  Conclusion:", ifelse(lb_test_resid$p.value > 0.05,
    "No autocorrelation detected (good)",
    "Autocorrelation detected (concern)"
), "\n\n")
  Conclusion: No autocorrelation detected (good) 
Code
# Ljung-Box test on squared standardized residuals
cat("=== Ljung-Box Test (Squared Standardized Residuals) ===\n")
=== Ljung-Box Test (Squared Standardized Residuals) ===
Code
cat("Null Hypothesis: No remaining ARCH effects\n\n")
Null Hypothesis: No remaining ARCH effects
Code
lb_test_sq <- Box.test(std_resid^2, lag = 10, type = "Ljung-Box", fitdf = 0)
cat("Lag 10:\n")
Lag 10:
Code
cat("  Q-statistic:", round(lb_test_sq$statistic, 4), "\n")
  Q-statistic: 2.7834 
Code
cat("  p-value:", round(lb_test_sq$p.value, 4), "\n")
  p-value: 0.9861 
Code
cat("  Conclusion:", ifelse(lb_test_sq$p.value > 0.05,
    "ARCH effects successfully captured (good)",
    "Remaining ARCH effects (may need higher order)"
), "\n")
  Conclusion: ARCH effects successfully captured (good) 

Diagnostic Interpretation:

  • ACF of Std. Residuals: No significant autocorrelation (validates mean equation)
  • Ljung-Box Tests: p-values > 0.05 indicate the model has adequately captured temporal dependencies
  • Q-Q Plot: Some deviation in tails suggests fat-tailed distribution (could consider t-distribution)
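Given the fat tails in the Q-Q plot, refitting with a Student-t conditional distribution is a natural extension. fGarch supports this via the `cond.dist` argument (a sketch; whether it actually improves the fit here is not verified):

```r
# Refit the selected model with Student-t innovations to accommodate fat tails
fit_t <- garchFit(~ arma(1, 0) + garch(1, 1),
    data = dkng_returns,
    cond.dist = "std", # standardized Student-t
    trace = FALSE
)

# Compare information criteria against the normal-innovation fit
cat("Normal AIC:", round(best_garch_model@fit$ics["AIC"], 4), "\n")
cat("Student-t AIC:", round(fit_t@fit$ics["AIC"], 4), "\n")
cat("Estimated degrees of freedom (shape):", round(coef(fit_t)["shape"], 2), "\n")
```

A small estimated `shape` (degrees of freedom) would confirm that the fat tails are material.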
Code
# Extract conditional volatility
cond_vol <- best_garch_model@sigma.t

# Create data frame for plotting
vol_df <- data.frame(
    Date = dkng$Date,
    Return_Pct = dkng$Return_Pct,
    Cond_Vol = as.numeric(cond_vol)
)

# Plot returns with conditional volatility bands
ggplot(vol_df, aes(x = Date)) +
    geom_line(aes(y = Return_Pct), color = "black", alpha = 0.6, linewidth = 0.4) +
    geom_ribbon(aes(ymin = -2 * Cond_Vol, ymax = 2 * Cond_Vol),
        fill = "blue", alpha = 0.2
    ) +
    geom_line(aes(y = 2 * Cond_Vol), color = "blue", linetype = "dashed", linewidth = 0.6) +
    geom_line(aes(y = -2 * Cond_Vol), color = "blue", linetype = "dashed", linewidth = 0.6) +
    geom_hline(yintercept = 0, linetype = "solid", color = "red", linewidth = 0.5) +
    labs(
        title = "DKNG: Returns with GARCH Conditional Volatility (±2σ Bands)",
        subtitle = "Blue bands show time-varying volatility forecast",
        x = "Date",
        y = "Return (%)"
    ) +
    theme_minimal()

Code
# Plot conditional volatility over time
ggplot(vol_df, aes(x = Date, y = Cond_Vol)) +
    geom_line(color = "darkgreen", linewidth = 0.8) +
    labs(
        title = "DKNG: GARCH Conditional Volatility Over Time",
        subtitle = "Volatility forecast from GARCH(1,1) model",
        x = "Date",
        y = "Conditional Volatility (%)"
    ) +
    theme_minimal()

Code
cat("\nCurrent Volatility (last observation):", round(tail(cond_vol, 1), 4), "%\n")

Current Volatility (last observation): 2.8037 %
Code
cat("Average Volatility:", round(mean(cond_vol), 4), "%\n")
Average Volatility: 4.0695 %
Code
cat("Max Volatility:", round(max(cond_vol), 4), "%\n")
Max Volatility: 7.4334 %
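Daily conditional volatility can be annualized with the usual √252 scaling (a sketch using `cond_vol` from above; the 252-trading-day convention is an assumption):

```r
# Annualize daily GARCH volatility: sigma_annual = sigma_daily * sqrt(252)
ann_factor <- sqrt(252)

cat("Current annualized volatility:",
    round(tail(cond_vol, 1) * ann_factor, 1), "%\n")
cat("Average annualized volatility:",
    round(mean(cond_vol) * ann_factor, 1), "%\n")
```

This puts the GARCH estimates on the same scale quoted in options markets.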

Code
penn_returns <- penn$Return_Pct

# Fit candidate mean models
mean_models_penn <- list(
    ARMA00 = arima(penn_returns, order = c(0, 0, 0)),
    ARMA10 = arima(penn_returns, order = c(1, 0, 0)),
    ARMA01 = arima(penn_returns, order = c(0, 0, 1)),
    ARMA11 = arima(penn_returns, order = c(1, 0, 1))
)

mean_aic_penn <- data.frame(
    Model = names(mean_models_penn),
    AIC = sapply(mean_models_penn, AIC),
    BIC = sapply(mean_models_penn, BIC)
)

cat("=== PENN Mean Equation Comparison ===\n\n")
=== PENN Mean Equation Comparison ===
Code
kable(mean_aic_penn,
    format = "html",
    digits = 2,
    caption = "Model Comparison: Mean Equation Only (PENN)"
) %>%
    kable_styling(
        full_width = FALSE,
        bootstrap_options = c("striped", "hover", "condensed", "responsive")
    ) %>%
    row_spec(which.min(mean_aic_penn$AIC), bold = TRUE, color = "white", background = "#006bb6")
Model Comparison: Mean Equation Only (PENN)
Model      AIC      BIC
ARMA00   7588.17  7598.44
ARMA10   7578.96  7594.37
ARMA01   7581.16  7596.57
ARMA11   7575.91  7596.46
Code
# ARCH test
best_mean_penn <- mean_models_penn[[which.min(mean_aic_penn$AIC)]]
resid_penn <- residuals(best_mean_penn)

arch_test_penn <- FinTS::ArchTest(resid_penn, lags = 12)
cat("\n=== ARCH LM Test (PENN) ===\n")

=== ARCH LM Test (PENN) ===
Code
cat("p-value:", format(arch_test_penn$p.value, scientific = TRUE, digits = 4), "\n")
p-value: 2.462e-99 
Code
cat("Conclusion:", ifelse(arch_test_penn$p.value < 0.05,
    "ARCH effects present - GARCH needed",
    "No ARCH effects"
), "\n")
Conclusion: ARCH effects present - GARCH needed 
Code
garch_models_penn <- list()
garch_results_penn <- data.frame()

for (spec in model_specs) {
    # Fit model using fGarch
    fit <- tryCatch(
        {
            garchFit(formula = spec$formula, data = penn_returns, trace = FALSE)
        },
        error = function(e) {
            cat("Warning: Failed to fit", spec$name, "for PENN\n")
            NULL
        }
    )

    if (!is.null(fit)) {
        garch_models_penn[[spec$name]] <- fit

        # Extract information criteria
        aic_val <- fit@fit$ics["AIC"]
        bic_val <- fit@fit$ics["BIC"]
        loglik_val <- fit@fit$llh

        garch_results_penn <- rbind(garch_results_penn, data.frame(
            Model = spec$name,
            AIC = aic_val,
            BIC = bic_val,
            Log_Likelihood = loglik_val
        ))
    }
}

cat("=== PENN GARCH Model Comparison ===\n\n")
=== PENN GARCH Model Comparison ===
Model Comparison: ARMA+GARCH(1,1) Models for PENN
Model AIC BIC Log_Likelihood
ARMA(0,0)+GARCH(1,1) 5.6132 5.6295 3523.881
ARMA(1,0)+GARCH(1,1) 5.6141 5.6345 3523.455
ARMA(0,1)+GARCH(1,1) 5.6141 5.6345 3523.469
ARMA(1,1)+GARCH(1,1) 5.6156 5.6402 3523.433
Code
best_garch_penn <- garch_models_penn[[which.min(garch_results_penn$AIC)]]
best_name_penn <- garch_results_penn$Model[which.min(garch_results_penn$AIC)]

cat("\n*** BEST MODEL:", best_name_penn, "***\n\n")

*** BEST MODEL: ARMA(0,0)+GARCH(1,1) ***
Code
cat("Model Coefficients:\n")
Model Coefficients:
Code
print(coef(best_garch_penn))
         mu       omega      alpha1       beta1 
-0.03411880  0.86283109  0.09135361  0.86147898 
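These estimates imply a long-run (unconditional) variance of ω / (1 − α₁ − β₁). A quick sketch using the fitted PENN model:

```r
# Long-run volatility implied by the GARCH(1,1) parameters:
# unconditional variance = omega / (1 - alpha1 - beta1)
cf_penn <- coef(best_garch_penn)
uncond_var <- unname(cf_penn["omega"] /
    (1 - cf_penn["alpha1"] - cf_penn["beta1"]))

cat("Unconditional daily volatility:", round(sqrt(uncond_var), 2), "%\n")
# This should be broadly comparable to the sample standard deviation
# of PENN returns reported earlier
```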
Code
std_resid_penn <- residuals(best_garch_penn, standardize = TRUE)

par(mfrow = c(3, 2))
plot(std_resid_penn, type = "l", main = "Standardized Residuals (PENN)")
abline(h = 0, col = "red", lty = 2)
plot(std_resid_penn^2, type = "l", main = "Squared Std. Residuals (PENN)")
acf(std_resid_penn, lag.max = 20, main = "ACF: Std. Residuals")
acf(std_resid_penn^2, lag.max = 20, main = "ACF: Squared Std. Residuals")
qqnorm(std_resid_penn, main = "Q-Q Plot")
qqline(std_resid_penn, col = "red")
hist(std_resid_penn, breaks = 50, probability = TRUE, main = "Distribution")
curve(dnorm(x), add = TRUE, col = "red", lwd = 2)

Code
par(mfrow = c(1, 1))

# Ljung-Box tests
lb_penn <- Box.test(std_resid_penn, lag = 10, type = "Ljung-Box")
lb_penn_sq <- Box.test(std_resid_penn^2, lag = 10, type = "Ljung-Box")

cat("\nLjung-Box (Std. Residuals): p =", round(lb_penn$p.value, 4), "\n")

Ljung-Box (Std. Residuals): p = 0.438 
Code
cat("Ljung-Box (Squared): p =", round(lb_penn_sq$p.value, 4), "\n")
Ljung-Box (Squared): p = 0.8794 
Code
cond_vol_penn <- best_garch_penn@sigma.t

vol_df_penn <- data.frame(
    Date = penn$Date,
    Return_Pct = penn$Return_Pct,
    Cond_Vol = as.numeric(cond_vol_penn)
)

ggplot(vol_df_penn, aes(x = Date)) +
    geom_line(aes(y = Return_Pct), color = "black", alpha = 0.6, linewidth = 0.4) +
    geom_ribbon(aes(ymin = -2 * Cond_Vol, ymax = 2 * Cond_Vol),
        fill = "red", alpha = 0.2
    ) +
    geom_line(aes(y = 2 * Cond_Vol), color = "red", linetype = "dashed", linewidth = 0.6) +
    geom_line(aes(y = -2 * Cond_Vol), color = "red", linetype = "dashed", linewidth = 0.6) +
    geom_hline(yintercept = 0, linetype = "solid", color = "black", linewidth = 0.5) +
    labs(
        title = "PENN: Returns with GARCH Conditional Volatility",
        x = "Date", y = "Return (%)"
    ) +
    theme_minimal()

Code
ggplot(vol_df_penn, aes(x = Date, y = Cond_Vol)) +
    geom_line(color = "darkred", linewidth = 0.8) +
    labs(
        title = "PENN: GARCH Conditional Volatility",
        x = "Date", y = "Conditional Volatility (%)"
    ) +
    theme_minimal()

Code
czr_returns <- czr$Return_Pct

# Fit candidate mean models
mean_models_czr <- list(
    ARMA00 = arima(czr_returns, order = c(0, 0, 0)),
    ARMA10 = arima(czr_returns, order = c(1, 0, 0)),
    ARMA01 = arima(czr_returns, order = c(0, 0, 1)),
    ARMA11 = arima(czr_returns, order = c(1, 0, 1))
)

mean_aic_czr <- data.frame(
    Model = names(mean_models_czr),
    AIC = sapply(mean_models_czr, AIC),
    BIC = sapply(mean_models_czr, BIC)
)

cat("=== CZR Mean Equation Comparison ===\n\n")
=== CZR Mean Equation Comparison ===
Code
kable(mean_aic_czr,
    format = "html",
    digits = 2,
    caption = "Model Comparison: Mean Equation Only (CZR)"
) %>%
    kable_styling(
        full_width = FALSE,
        bootstrap_options = c("striped", "hover", "condensed", "responsive")
    ) %>%
    row_spec(which.min(mean_aic_czr$AIC), bold = TRUE, color = "white", background = "#006bb6")
Model Comparison: Mean Equation Only (CZR)
Model      AIC      BIC
ARMA00   8671.32  8681.92
ARMA10   8656.90  8672.79
ARMA01   8658.86  8674.76
ARMA11   8655.48  8676.67
Code
# ARCH test
best_mean_czr <- mean_models_czr[[which.min(mean_aic_czr$AIC)]]
resid_czr <- residuals(best_mean_czr)

cat("\n")
Code
# Plot squared residuals
par(mfrow = c(1, 2))
acf(resid_czr^2, lag.max = 30, main = "ACF: Squared Residuals (CZR)")
pacf(resid_czr^2, lag.max = 30, main = "PACF: Squared Residuals (CZR)")

Code
par(mfrow = c(1, 1))

arch_test_czr <- FinTS::ArchTest(resid_czr, lags = 12)
cat("\n=== ARCH LM Test (CZR) ===\n")

=== ARCH LM Test (CZR) ===
Code
cat("Chi-squared:", round(arch_test_czr$statistic, 4), "\n")
Chi-squared: 532.7903 
Code
cat("p-value:", format(arch_test_czr$p.value, scientific = TRUE, digits = 4), "\n")
p-value: 2.305e-106 
Code
cat("Conclusion:", ifelse(arch_test_czr$p.value < 0.05,
    "ARCH effects present → GARCH needed",
    "No ARCH effects"
), "\n")
Conclusion: ARCH effects present → GARCH needed 

Interpretation: The ACF/PACF of squared residuals show significant autocorrelation at multiple lags, and the ARCH LM test confirms the presence of ARCH effects. This validates the need for GARCH modeling to capture CZR’s time-varying volatility.

Code
garch_models_czr <- list()
garch_results_czr <- data.frame()

for (spec in model_specs) {
    # Fit model using fGarch
    fit <- tryCatch(
        {
            garchFit(formula = spec$formula, data = czr_returns, trace = FALSE)
        },
        error = function(e) {
            cat("Warning: Failed to fit", spec$name, "for CZR\n")
            NULL
        }
    )

    if (!is.null(fit)) {
        garch_models_czr[[spec$name]] <- fit

        # Extract information criteria
        aic_val <- fit@fit$ics["AIC"]
        bic_val <- fit@fit$ics["BIC"]
        loglik_val <- fit@fit$llh

        garch_results_czr <- rbind(garch_results_czr, data.frame(
            Model = spec$name,
            AIC = aic_val,
            BIC = bic_val,
            Log_Likelihood = loglik_val
        ))
    }
}

cat("=== CZR GARCH Model Comparison ===\n\n")
=== CZR GARCH Model Comparison ===
Model Comparison: ARMA+GARCH(1,1) Models for CZR
Model AIC BIC Log_Likelihood
ARMA(0,0)+GARCH(1,1) 5.3783 5.3926 3967.845
ARMA(1,0)+GARCH(1,1) 5.3784 5.3963 3966.944
ARMA(0,1)+GARCH(1,1) 5.3784 5.3963 3966.913
ARMA(1,1)+GARCH(1,1) 5.3796 5.4011 3966.849
Code
best_garch_czr <- garch_models_czr[[which.min(garch_results_czr$AIC)]]
best_name_czr <- garch_results_czr$Model[which.min(garch_results_czr$AIC)]

cat("\n*** BEST MODEL:", best_name_czr, "***\n\n")

*** BEST MODEL: ARMA(0,0)+GARCH(1,1) ***
Code
cat("Model Coefficients:\n")
Model Coefficients:
Code
print(coef(best_garch_czr))
         mu       omega      alpha1       beta1 
-0.01029034  0.66993148  0.11243921  0.84204206 

Model Selection Rationale: The AIC-selected model balances fit and parsimony. CZR exhibits volatility patterns driven by both established gaming operations and newer sports betting ventures.
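The fitted model can also produce multi-step volatility forecasts. fGarch's `predict` method returns the forecast conditional standard deviation path (a sketch; the 10-day horizon is arbitrary):

```r
# 10-day-ahead forecast from the selected CZR model
fc_czr <- predict(best_garch_czr, n.ahead = 10)

# predict() returns meanForecast, meanError, and standardDeviation columns
cat("Forecast volatility path (%):\n")
print(round(fc_czr$standardDeviation, 3))

# Multi-step forecasts revert toward the long-run volatility level
# at a rate governed by alpha1 + beta1
```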

Code
std_resid_czr <- residuals(best_garch_czr, standardize = TRUE)

par(mfrow = c(3, 2))
plot(std_resid_czr, type = "l", main = "Standardized Residuals (CZR)", ylab = "Std. Residuals")
abline(h = 0, col = "red", lty = 2)
abline(h = c(-2, 2), col = "blue", lty = 2)

plot(std_resid_czr^2, type = "l", main = "Squared Std. Residuals (CZR)", ylab = "Residuals²")

acf(std_resid_czr, lag.max = 20, main = "ACF: Std. Residuals")
acf(std_resid_czr^2, lag.max = 20, main = "ACF: Squared Std. Residuals")

qqnorm(std_resid_czr, main = "Q-Q Plot (CZR)")
qqline(std_resid_czr, col = "red")

hist(std_resid_czr, breaks = 50, probability = TRUE, main = "Distribution (CZR)", xlab = "Std. Residuals")
curve(dnorm(x), add = TRUE, col = "red", lwd = 2)

Code
par(mfrow = c(1, 1))

# Ljung-Box tests
lb_czr <- Box.test(std_resid_czr, lag = 10, type = "Ljung-Box")
lb_czr_sq <- Box.test(std_resid_czr^2, lag = 10, type = "Ljung-Box")

cat("\n=== Ljung-Box Tests (CZR) ===\n\n")

=== Ljung-Box Tests (CZR) ===
Code
cat("Standardized Residuals: p =", round(lb_czr$p.value, 4), "\n")
Standardized Residuals: p = 0.5736 
Code
cat("  Conclusion:", ifelse(lb_czr$p.value > 0.05,
    "No autocorrelation (good)",
    "Autocorrelation present (concern)"
), "\n\n")
  Conclusion: No autocorrelation (good) 
Code
cat("Squared Std. Residuals: p =", round(lb_czr_sq$p.value, 4), "\n")
Squared Std. Residuals: p = 0.7876 
Code
cat("  Conclusion:", ifelse(lb_czr_sq$p.value > 0.05,
    "ARCH effects captured (good)",
    "Remaining ARCH effects"
), "\n")
  Conclusion: ARCH effects captured (good) 

Diagnostic Summary:

  • Standardized residuals should resemble white noise (no pattern in the ACF)
  • The ACF of squared residuals tests whether GARCH captured all volatility clustering
  • Ljung-Box p > 0.05 indicates successful model specification
  • The Q-Q plot reveals any remaining fat-tail characteristics
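These checks can be bundled into a small helper. The sketch below is a minimal base-R version; the function name `garch_diagnostics` and the simulated input are illustrative, not objects from the analysis above.

```r
# Minimal diagnostic battery for standardized GARCH residuals (base R only).
# Returns Ljung-Box p-values for the residuals and their squares:
# both should exceed 0.05 if the mean and variance equations are adequate.
garch_diagnostics <- function(z, lag = 10) {
    c(
        lb_resid    = Box.test(z,   lag = lag, type = "Ljung-Box")$p.value,
        lb_resid_sq = Box.test(z^2, lag = lag, type = "Ljung-Box")$p.value
    )
}

# Illustrative call on simulated white noise, which is how a well-specified
# model's standardized residuals should behave.
set.seed(42)
garch_diagnostics(rnorm(1000))
```

For the fitted models in this chapter, the same call would be `garch_diagnostics(std_resid_czr)`.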

Code
cond_vol_czr <- best_garch_czr@sigma.t

vol_df_czr <- data.frame(
    Date = czr$Date,
    Return_Pct = czr$Return_Pct,
    Cond_Vol = as.numeric(cond_vol_czr)
)

ggplot(vol_df_czr, aes(x = Date)) +
    geom_line(aes(y = Return_Pct), color = "black", alpha = 0.6, linewidth = 0.4) +
    geom_ribbon(aes(ymin = -2 * Cond_Vol, ymax = 2 * Cond_Vol),
        fill = "darkgreen", alpha = 0.2
    ) +
    geom_line(aes(y = 2 * Cond_Vol), color = "darkgreen", linetype = "dashed", linewidth = 0.6) +
    geom_line(aes(y = -2 * Cond_Vol), color = "darkgreen", linetype = "dashed", linewidth = 0.6) +
    geom_hline(yintercept = 0, linetype = "solid", color = "black", linewidth = 0.5) +
    labs(
        title = "CZR: Returns with GARCH Conditional Volatility (±2σ bands)",
        subtitle = "Green bands show time-varying volatility forecast",
        x = "Date",
        y = "Return (%)"
    ) +
    theme_minimal()

Code
ggplot(vol_df_czr, aes(x = Date, y = Cond_Vol)) +
    geom_line(color = "darkgreen", linewidth = 0.8) +
    labs(
        title = "CZR: GARCH Conditional Volatility Over Time",
        subtitle = "Volatility reflects both casino operations and sports betting integration",
        x = "Date",
        y = "Conditional Volatility (%)"
    ) +
    theme_minimal()

Code
cat("\nCurrent Volatility (last observation):", round(tail(cond_vol_czr, 1), 4), "%\n")

Current Volatility (last observation): 3.3838 %
Code
cat("Average Volatility:", round(mean(cond_vol_czr), 4), "%\n")
Average Volatility: 3.8232 %
Code
cat("Max Volatility:", round(max(cond_vol_czr), 4), "%\n")
Max Volatility: 22.069 %

Volatility Interpretation: CZR’s conditional volatility shows periods of elevated risk during major corporate events. The GARCH model successfully tracks these time-varying patterns, providing valuable risk forecasts for portfolio managers and options traders.
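The risk-forecasting use case can be sketched directly from the reported estimates. The closed-form GARCH(1,1) forecast is \(\sigma^2_{t+h} = \sigma^2_{\infty} + (\alpha_1 + \beta_1)^h (\sigma^2_t - \sigma^2_{\infty})\); the base-R sketch below plugs in the CZR coefficients from above, and the VaR line assumes normal innovations, as in the fitted model.

```r
# h-step-ahead GARCH(1,1) variance forecasts in closed form, using the CZR
# estimates reported above (omega, alpha1, beta1). Forecasts mean-revert from
# the current variance toward the unconditional level at rate alpha1 + beta1.
omega  <- 0.669931
alpha1 <- 0.112439
beta1  <- 0.842042
persistence   <- alpha1 + beta1
sigma2_uncond <- omega / (1 - persistence)       # ~14.72, as derived below

garch_forecast <- function(sigma2_now, h) {
    sigma2_uncond + persistence^h * (sigma2_now - sigma2_uncond)
}

# Start from the last fitted conditional volatility (3.3838%, see above)
sigma2_now <- 3.3838^2
vol_path   <- sqrt(garch_forecast(sigma2_now, 1:20))

# One-day 95% VaR implied by tomorrow's volatility forecast, as a positive loss
mu <- -0.01029
VaR_95 <- qnorm(0.95) * vol_path[1] - mu
round(c(vol_tomorrow = vol_path[1], VaR_95 = VaR_95), 4)
```

Because the last fitted volatility sits below the unconditional level, the forecast path rises monotonically toward it, which is the characteristic mean-reverting shape of GARCH volatility forecasts.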

Code
mgm_returns <- mgm$Return_Pct

# Fit candidate mean models
mean_models_mgm <- list(
    ARMA00 = arima(mgm_returns, order = c(0, 0, 0)),
    ARMA10 = arima(mgm_returns, order = c(1, 0, 0)),
    ARMA01 = arima(mgm_returns, order = c(0, 0, 1)),
    ARMA11 = arima(mgm_returns, order = c(1, 0, 1))
)

mean_aic_mgm <- data.frame(
    Model = names(mean_models_mgm),
    AIC = sapply(mean_models_mgm, AIC),
    BIC = sapply(mean_models_mgm, BIC)
)

cat("=== MGM Mean Equation Comparison ===\n\n")
=== MGM Mean Equation Comparison ===
Code
kable(mean_aic_mgm,
    format = "html",
    digits = 2,
    caption = "Model Comparison: Mean Equation Only (MGM)"
) %>%
    kable_styling(
        full_width = FALSE,
        bootstrap_options = c("striped", "hover", "condensed", "responsive")
    ) %>%
    row_spec(which.min(mean_aic_mgm$AIC), bold = TRUE, color = "white", background = "#006bb6")
Model Comparison: Mean Equation Only (MGM)
| Model | AIC | BIC |
|---|---|---|
| ARMA00 | 7815.75 | 7826.35 |
| ARMA10 | 7815.98 | 7831.87 |
| ARMA01 | 7816.29 | 7832.19 |
| ARMA11 | 7814.28 | 7835.47 |
Code
# ARCH test
best_mean_mgm <- mean_models_mgm[[which.min(mean_aic_mgm$AIC)]]
resid_mgm <- residuals(best_mean_mgm)

cat("\n")
Code
# Plot squared residuals
par(mfrow = c(1, 2))
acf(resid_mgm^2, lag.max = 30, main = "ACF: Squared Residuals (MGM)")
pacf(resid_mgm^2, lag.max = 30, main = "PACF: Squared Residuals (MGM)")

Code
par(mfrow = c(1, 1))

arch_test_mgm <- FinTS::ArchTest(resid_mgm, lags = 12)
cat("\n=== ARCH LM Test (MGM) ===\n")

=== ARCH LM Test (MGM) ===
Code
cat("Chi-squared:", round(arch_test_mgm$statistic, 4), "\n")
Chi-squared: 467.1476 
Code
cat("p-value:", format(arch_test_mgm$p.value, scientific = TRUE, digits = 4), "\n")
p-value: 2.15e-92 
Code
cat("Conclusion:", ifelse(arch_test_mgm$p.value < 0.05,
    "ARCH effects present → GARCH needed",
    "No ARCH effects"
), "\n")
Conclusion: ARCH effects present → GARCH needed 

Interpretation: Despite MGM’s diversified business model, the stock still exhibits ARCH effects, though potentially milder than pure-play sports betting stocks. The ARCH LM test confirms time-varying volatility requiring GARCH specification.

Code
garch_models_mgm <- list()
garch_results_mgm <- data.frame()

for (spec in model_specs) {
    # Fit model using fGarch
    fit <- tryCatch(
        {
            garchFit(formula = spec$formula, data = mgm_returns, trace = FALSE)
        },
        error = function(e) {
            cat("Warning: Failed to fit", spec$name, "for MGM\n")
            NULL
        }
    )

    if (!is.null(fit)) {
        garch_models_mgm[[spec$name]] <- fit

        # Extract information criteria
        aic_val <- fit@fit$ics["AIC"]
        bic_val <- fit@fit$ics["BIC"]
        loglik_val <- fit@fit$llh

        garch_results_mgm <- rbind(garch_results_mgm, data.frame(
            Model = spec$name,
            AIC = aic_val,
            BIC = bic_val,
            Log_Likelihood = loglik_val
        ))
    }
}

cat("=== MGM GARCH Model Comparison ===\n\n")
=== MGM GARCH Model Comparison ===
Model Comparison: ARMA+GARCH(1,1) Models for MGM
| Model | AIC | BIC | Log_Likelihood |
|---|---|---|---|
| ARMA(0,0)+GARCH(1,1) | 4.9152 | 4.9296 | 3625.892 |
| ARMA(1,0)+GARCH(1,1) | 4.9165 | 4.9344 | 3625.799 |
| ARMA(0,1)+GARCH(1,1) | 4.9165 | 4.9344 | 3625.800 |
| ARMA(1,1)+GARCH(1,1) | 4.9142 | 4.9357 | 3623.114 |
Code
best_garch_mgm <- garch_models_mgm[[which.min(garch_results_mgm$AIC)]]
best_name_mgm <- garch_results_mgm$Model[which.min(garch_results_mgm$AIC)]

cat("\n*** BEST MODEL:", best_name_mgm, "***\n\n")

*** BEST MODEL: ARMA(1,1)+GARCH(1,1) ***
Code
cat("Model Coefficients:\n")
Model Coefficients:
Code
print(coef(best_garch_mgm))
          mu          ar1          ma1        omega       alpha1        beta1 
 0.002507931  0.951473676 -0.969389579  0.533056417  0.083842660  0.854398683 

Model Selection Rationale: The selected GARCH(1,1) specification for MGM captures volatility persistence while maintaining parsimony. MGM’s diversified operations (Las Vegas Strip properties, regional casinos, BetMGM, entertainment venues) create a more stable volatility profile compared to pure-play operators, which should be reflected in the model coefficients.

Code
std_resid_mgm <- residuals(best_garch_mgm, standardize = TRUE)

par(mfrow = c(3, 2))
plot(std_resid_mgm, type = "l", main = "Standardized Residuals (MGM)", ylab = "Std. Residuals")
abline(h = 0, col = "red", lty = 2)
abline(h = c(-2, 2), col = "blue", lty = 2)

plot(std_resid_mgm^2, type = "l", main = "Squared Std. Residuals (MGM)", ylab = "Residuals²")

acf(std_resid_mgm, lag.max = 20, main = "ACF: Std. Residuals")
acf(std_resid_mgm^2, lag.max = 20, main = "ACF: Squared Std. Residuals")

qqnorm(std_resid_mgm, main = "Q-Q Plot (MGM)")
qqline(std_resid_mgm, col = "red")

hist(std_resid_mgm, breaks = 50, probability = TRUE, main = "Distribution (MGM)", xlab = "Std. Residuals")
curve(dnorm(x), add = TRUE, col = "red", lwd = 2)

Code
par(mfrow = c(1, 1))

# Ljung-Box tests
lb_mgm <- Box.test(std_resid_mgm, lag = 10, type = "Ljung-Box")
lb_mgm_sq <- Box.test(std_resid_mgm^2, lag = 10, type = "Ljung-Box")

cat("\n=== Ljung-Box Tests (MGM) ===\n\n")

=== Ljung-Box Tests (MGM) ===
Code
cat("Standardized Residuals: p =", round(lb_mgm$p.value, 4), "\n")
Standardized Residuals: p = 0.2804 
Code
cat("  Conclusion:", ifelse(lb_mgm$p.value > 0.05,
    "No autocorrelation (good)",
    "Autocorrelation present (concern)"
), "\n\n")
  Conclusion: No autocorrelation (good) 
Code
cat("Squared Std. Residuals: p =", round(lb_mgm_sq$p.value, 4), "\n")
Squared Std. Residuals: p = 0.922 
Code
cat("  Conclusion:", ifelse(lb_mgm_sq$p.value > 0.05,
    "ARCH effects captured (good)",
    "Remaining ARCH effects"
), "\n")
  Conclusion: ARCH effects captured (good) 

Diagnostic Summary: MGM’s standardized residuals should show cleaner behavior than PENN or DKNG, reflecting the stabilizing effect of diversified revenue streams. Successful diagnostics (Ljung-Box p > 0.05) indicate the GARCH model adequately captures MGM’s conditional volatility dynamics.

Code
cond_vol_mgm <- best_garch_mgm@sigma.t

vol_df_mgm <- data.frame(
    Date = mgm$Date,
    Return_Pct = mgm$Return_Pct,
    Cond_Vol = as.numeric(cond_vol_mgm)
)

ggplot(vol_df_mgm, aes(x = Date)) +
    geom_line(aes(y = Return_Pct), color = "black", alpha = 0.6, linewidth = 0.4) +
    geom_ribbon(aes(ymin = -2 * Cond_Vol, ymax = 2 * Cond_Vol),
        fill = "darkorange", alpha = 0.2
    ) +
    geom_line(aes(y = 2 * Cond_Vol), color = "darkorange", linetype = "dashed", linewidth = 0.6) +
    geom_line(aes(y = -2 * Cond_Vol), color = "darkorange", linetype = "dashed", linewidth = 0.6) +
    geom_hline(yintercept = 0, linetype = "solid", color = "black", linewidth = 0.5) +
    labs(
        title = "MGM: Returns with GARCH Conditional Volatility (±2σ bands)",
        subtitle = "Orange bands show time-varying volatility forecast",
        x = "Date",
        y = "Return (%)"
    ) +
    theme_minimal()

Code
ggplot(vol_df_mgm, aes(x = Date, y = Cond_Vol)) +
    geom_line(color = "darkorange", linewidth = 0.8) +
    labs(
        title = "MGM: GARCH Conditional Volatility Over Time",
        subtitle = "Most stable volatility profile among all sports betting stocks",
        x = "Date",
        y = "Conditional Volatility (%)"
    ) +
    theme_minimal()

Code
cat("\nCurrent Volatility (last observation):", round(tail(cond_vol_mgm, 1), 4), "%\n")

Current Volatility (last observation): 2.222 %
Code
cat("Average Volatility:", round(mean(cond_vol_mgm), 4), "%\n")
Average Volatility: 2.9554 %
Code
cat("Max Volatility:", round(max(cond_vol_mgm), 4), "%\n")
Max Volatility: 15.3038 %

Volatility Interpretation: MGM’s conditional volatility time series shows the lowest average volatility among all four stocks, confirming that diversification provides natural hedging against sector-specific shocks. The GARCH forecasts are particularly valuable for MGM during major events that create temporary volatility spikes.


Model Equations

Selected Model (DKNG): ARMA(1,0)+GARCH(1,1)

Mean Equation

\[ r_t = 0.062689 + \phi_1 r_{t-1} + \epsilon_t \]

where \(\phi_1\) is the estimated AR(1) coefficient, \(\epsilon_t = \sigma_t z_t\), and \(z_t \sim N(0, 1)\)

Variance Equation (GARCH(1,1))

\[ \sigma_t^2 = 0.061626 + 0.027946 \epsilon_{t-1}^2 + 0.96814 \sigma_{t-1}^2 \]

Unconditional Variance:

\[ \sigma^2_{\text{unconditional}} = \frac{\omega}{1 - \alpha_1 - \beta_1} = \frac{0.061626}{1 - 0.9961} = 15.7459 \]

Unconditional volatility: 3.9681%

Selected Model (PENN): ARMA(0,0)+GARCH(1,1)

Mean Equation

\[ r_t = -0.034119 + \epsilon_t \]

where \(\epsilon_t = \sigma_t z_t\) and \(z_t \sim N(0, 1)\)

Variance Equation (GARCH(1,1))

\[ \sigma_t^2 = 0.862831 + 0.091354 \epsilon_{t-1}^2 + 0.861479 \sigma_{t-1}^2 \]
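Applying the same unconditional-variance formula used for the other fitted models to the coefficients above gives:

Unconditional Variance:

\[ \sigma^2_{\text{unconditional}} = \frac{\omega}{1 - \alpha_1 - \beta_1} = \frac{0.862831}{1 - 0.9528} = 18.2931 \]

Unconditional volatility: 4.2770% (the highest among the four fitted models)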

Selected Model (CZR): ARMA(0,0)+GARCH(1,1)

Mean Equation

\[ r_t = -0.01029 + \epsilon_t \]

where \(\epsilon_t = \sigma_t z_t\) and \(z_t \sim N(0, 1)\)

Variance Equation (GARCH(1,1))

\[ \sigma_t^2 = 0.669931 + 0.112439 \epsilon_{t-1}^2 + 0.842042 \sigma_{t-1}^2 \]

Unconditional Variance:

\[ \sigma^2_{\text{unconditional}} = \frac{\omega}{1 - \alpha_1 - \beta_1} = \frac{0.669931}{1 - 0.9545} = 14.7177 \]

Unconditional volatility: 3.8364%

Selected Model (MGM): ARMA(1,1)+GARCH(1,1)

Mean Equation

\[ r_t = 0.002508 + 0.951474\, r_{t-1} - 0.969390\, \epsilon_{t-1} + \epsilon_t \]

where \(\epsilon_t = \sigma_t z_t\) and \(z_t \sim N(0, 1)\)

Variance Equation (GARCH(1,1))

\[ \sigma_t^2 = 0.533056 + 0.083843 \epsilon_{t-1}^2 + 0.854399 \sigma_{t-1}^2 \]

Unconditional Variance:

\[ \sigma^2_{\text{unconditional}} = \frac{\omega}{1 - \alpha_1 - \beta_1} = \frac{0.533056}{1 - 0.9382} = 8.6313 \]

Unconditional volatility: 2.9379% (lowest among all four stocks)

Business Model Impact: MGM’s lower volatility parameters directly reflect its diversified operations. The smaller \(\alpha_1\) suggests that MGM’s stock is less sensitive to short-term shocks, while moderate \(\beta_1\) indicates reasonable volatility persistence.
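One way to make this comparison concrete is the volatility half-life, \(\log(0.5) / \log(\alpha_1 + \beta_1)\): the number of trading days for a variance shock's impact to decay by half. The base-R sketch below uses the \(\alpha_1\) and \(\beta_1\) estimates reported in this chapter.

```r
# Volatility persistence and half-life for each fitted GARCH(1,1) model,
# using the alpha1/beta1 estimates reported above. Half-life is the number
# of trading days for a variance shock to decay to half its initial impact.
params <- data.frame(
    stock  = c("DKNG", "PENN", "CZR", "MGM"),
    alpha1 = c(0.027946, 0.091354, 0.112439, 0.083843),
    beta1  = c(0.968140, 0.861479, 0.842042, 0.854399)
)
params$persistence <- params$alpha1 + params$beta1
params$half_life   <- log(0.5) / log(params$persistence)
params[order(params$half_life), ]
```

Under these estimates a shock to MGM's variance halves in roughly 11 trading days, PENN and CZR in roughly 14-15 days, while DKNG's persistence near one stretches the half-life to roughly 177 days.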


Cross-Stock Comparison

Now that we’ve analyzed individual stocks, let’s compare volatility dynamics across all four sports betting companies.

Code
# Combine all returns data
all_returns <- bind_rows(
    dkng %>% dplyr::select(Date, Return_Pct, Ticker),
    penn %>% dplyr::select(Date, Return_Pct, Ticker),
    czr %>% dplyr::select(Date, Return_Pct, Ticker),
    mgm %>% dplyr::select(Date, Return_Pct, Ticker)
)

# Plot all returns together
p_all_returns <- ggplot(all_returns, aes(x = Date, y = Return_Pct, color = Ticker)) +
    geom_line(alpha = 0.5, linewidth = 0.3) +
    facet_wrap(~Ticker, ncol = 1) +
    geom_hline(yintercept = 0, linetype = "dashed", color = "black") +
    labs(
        title = "Daily Returns Comparison: All Sports Betting Stocks",
        subtitle = "Volatility clustering visible across all stocks",
        x = "Date",
        y = "Return (%)"
    ) +
    scale_color_manual(values = c(
        "DKNG" = "#006bb6", "PENN" = "darkred",
        "CZR" = "darkgreen", "MGM" = "darkorange"
    )) +
    theme_minimal() +
    theme(legend.position = "none")

print(p_all_returns)

Code
# Rolling volatility comparison
all_vol <- bind_rows(
    dkng %>% dplyr::select(Date, Roll_Vol_20, Ticker),
    penn %>% dplyr::select(Date, Roll_Vol_20, Ticker),
    czr %>% dplyr::select(Date, Roll_Vol_20, Ticker),
    mgm %>% dplyr::select(Date, Roll_Vol_20, Ticker)
) %>% filter(!is.na(Roll_Vol_20))

p_vol_comp <- ggplot(all_vol, aes(x = Date, y = Roll_Vol_20, color = Ticker)) +
    geom_line(linewidth = 0.8, alpha = 0.8) +
    labs(
        title = "20-Day Rolling Volatility Comparison",
        subtitle = "PENN shows highest volatility, MGM most stable",
        x = "Date",
        y = "Volatility (% std dev)",
        color = "Stock"
    ) +
    scale_color_manual(values = c(
        "DKNG" = "#006bb6", "PENN" = "darkred",
        "CZR" = "darkgreen", "MGM" = "darkorange"
    )) +
    theme_minimal() +
    theme(legend.position = "right")

print(p_vol_comp)

Code
# Save for conclusion
ggsave("images/betting_stocks_volatility.png", plot = p_vol_comp, width = 14, height = 8, dpi = 300, bg = "white")

Key Observations:

  • PENN exhibits the highest volatility throughout the period, reflecting business model uncertainty
  • MGM shows the most stable returns, benefiting from diversified resort operations
  • DKNG and CZR display moderate volatility, typical of focused sports betting operators
  • All stocks show synchronized volatility spikes during market-wide events (COVID, regulatory news)
Comparative Summary Statistics: Sports Betting Stocks
| Stock | Mean Return (%) | Volatility (%) | Min Return (%) | Max Return (%) | Skewness | Kurtosis |
|---|---|---|---|---|---|---|
| DKNG | 0.0638 | 4.2354 | -32.6061 | 15.9306 | -0.3807 | 7.8417 |
| PENN | -0.0220 | 4.9443 | -59.4142 | 29.8592 | -1.7046 | 30.3981 |
| CZR | -0.0718 | 4.5517 | -47.0084 | 36.5733 | -0.9193 | 21.3669 |
| MGM | -0.0042 | 3.4071 | -40.9684 | 28.6042 | -0.9679 | 26.9113 |
Code
# Density plots
ggplot(all_returns, aes(x = Return_Pct, fill = Ticker)) +
    geom_density(alpha = 0.5) +
    facet_wrap(~Ticker, scales = "free_y") +
    labs(
        title = "Return Distribution Comparison",
        subtitle = "All stocks show fat tails (leptokurtic distributions)",
        x = "Return (%)",
        y = "Density"
    ) +
    scale_fill_manual(values = c(
        "DKNG" = "#006bb6", "PENN" = "darkred",
        "CZR" = "darkgreen", "MGM" = "darkorange"
    )) +
    geom_vline(xintercept = 0, linetype = "dashed", color = "black") +
    theme_minimal() +
    theme(legend.position = "none")

Code
# Box plots
ggplot(all_returns, aes(x = Ticker, y = Return_Pct, fill = Ticker)) +
    geom_boxplot(alpha = 0.7, outlier.alpha = 0.3) +
    labs(
        title = "Return Distribution Box Plots",
        subtitle = "PENN shows widest range and most extreme outliers",
        x = "Stock",
        y = "Return (%)"
    ) +
    scale_fill_manual(values = c(
        "DKNG" = "#006bb6", "PENN" = "darkred",
        "CZR" = "darkgreen", "MGM" = "darkorange"
    )) +
    geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
    theme_minimal() +
    theme(legend.position = "none")

Code
# Create wide format for correlation
returns_wide <- all_returns %>%
    dplyr::select(Date, Ticker, Return_Pct) %>%
    pivot_wider(names_from = Ticker, values_from = Return_Pct) %>%
    dplyr::select(-Date) %>%
    na.omit()

# Correlation matrix
cor_matrix <- cor(returns_wide)

# Display formatted correlation matrix table
cat("### Correlation Matrix\n\n")

Correlation Matrix

Code
kable(round(cor_matrix, 3),
    format = "html",
    digits = 3,
    caption = "Return Correlations Among Sports Betting Stocks"
) %>%
    kable_styling(
        full_width = FALSE,
        bootstrap_options = c("striped", "hover", "condensed", "responsive")
    )
Return Correlations Among Sports Betting Stocks
|      | DKNG | PENN | CZR | MGM |
|---|---|---|---|---|
| DKNG | 1.000 | 0.499 | 0.520 | 0.379 |
| PENN | 0.499 | 1.000 | 0.696 | 0.597 |
| CZR | 0.520 | 0.696 | 1.000 | 0.739 |
| MGM | 0.379 | 0.597 | 0.739 | 1.000 |
Code
# Heatmap
library(reshape2)
cor_melted <- melt(cor_matrix)

ggplot(cor_melted, aes(Var1, Var2, fill = value)) +
    geom_tile(color = "white") +
    geom_text(aes(label = round(value, 2)), color = "black", size = 5, fontface = "bold") +
    scale_fill_gradient2(
        low = "#d1e5f0", high = "#006bb6", mid = "white",
        midpoint = 0, limit = c(-1, 1), space = "Lab",
        name = "Correlation"
    ) +
    labs(
        title = "Return Correlation Heatmap",
        subtitle = "Sports betting stocks show moderate to high positive correlation",
        x = NULL, y = NULL
    ) +
    theme_minimal() +
    theme(
        axis.text.x = element_text(angle = 45, hjust = 1, size = 11),
        axis.text.y = element_text(size = 11),
        plot.title = element_text(face = "bold")
    )

Correlation Insights:

  • All stocks show positive correlation (0.38-0.74), indicating they respond to common industry factors
  • CZR-MGM show the highest correlation (0.74): both are traditional casino operators
  • DKNG-PENN show a moderate correlation (0.50): sports-betting-driven business models
  • Correlations increase during crisis periods (systematic risk)
| Stock | Business Model | Volatility Profile | Key Risks |
|---|---|---|---|
| DKNG | Pure-play online sports betting & DFS | Moderate-High | Regulatory, competition, customer acquisition costs |
| PENN | Regional casinos + ESPN BET partnership | Highest | Business model transition, brand partnerships |
| CZR | Casino operator + Caesars Sportsbook | Moderate | Debt levels, integration risks, competition |
| MGM | Resort/casino + BetMGM | Lowest | Diversified operations reduce volatility |
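The overview named portfolio optimization as one application of these estimates. The base-R sketch below builds a covariance matrix from the daily volatilities and correlations reported above and solves for the unconstrained minimum-variance weights, \(w = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^\top \Sigma^{-1}\mathbf{1})\); short positions are allowed, and the inputs are simply the sample estimates from the tables.

```r
# Minimum-variance portfolio weights across the four stocks, built from the
# daily volatilities and return correlations reported above.
vols <- c(DKNG = 4.2354, PENN = 4.9443, CZR = 4.5517, MGM = 3.4071)
corr <- matrix(c(
    1.000, 0.499, 0.520, 0.379,
    0.499, 1.000, 0.696, 0.597,
    0.520, 0.696, 1.000, 0.739,
    0.379, 0.597, 0.739, 1.000
), nrow = 4, dimnames = list(names(vols), names(vols)))

# Covariance: rho_ij * sigma_i * sigma_j (elementwise product keeps dimnames)
Sigma <- corr * outer(vols, vols)

# w = Sigma^-1 1 / (1' Sigma^-1 1): the fully invested portfolio with the
# smallest variance achievable from these four stocks
ones <- rep(1, 4)
w <- solve(Sigma, ones) / sum(solve(Sigma, ones))
round(w, 3)
```

By construction the weights sum to one and the resulting portfolio variance is no larger than that of any single stock, which is the diversification point the correlation analysis is driving at.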

Bridging to Deep Learning for Time Series

The financial models above demonstrate that sports betting stocks exhibit classic volatility clustering patterns captured by ARCH/GARCH frameworks. DKNG, PENN, CZR, and MGM show time-varying variance where turbulent periods persist before giving way to calm. These models excel at forecasting conditional volatility—predicting the uncertainty around returns rather than the returns themselves—making them essential for risk management and options pricing.

Yet ARCH/GARCH models share a fundamental assumption with ARIMA and VAR: they impose explicit parametric structure. GARCH(1,1) assumes today’s variance depends on yesterday’s variance and yesterday’s squared return with fixed coefficients. ARIMA(p,d,q) requires us to specify lag orders before fitting. These models demand domain expertise and statistical judgment, making them powerful when assumptions hold but brittle when reality deviates.

The deep learning alternative abandons explicit parameterization in favor of learned representations. Recurrent neural networks (RNNs), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRUs) automatically discover temporal patterns through gradient descent rather than requiring analysts to pre-specify AR and MA terms. This flexibility enables deep learning models to capture complex nonlinearities, long-range dependencies, and regime shifts that defy simple parametric forms.

The next chapter tests whether this flexibility improves forecasts or simply overfits limited data. With only 45 years of NBA data, do LSTM’s 1,600+ parameters extract genuine signal or memorize noise? The comparison is direct: identical train/test splits, identical evaluation metrics (RMSE, MAE), identical target variable (ORtg). Traditional methods have interpretable coefficients and well-understood statistical properties; deep learning offers representational power but requires careful regularization.

What you’ll see next:

  • RNN, LSTM, and GRU architectures applied to NBA offensive efficiency forecasting, with explicit comparison to ARIMA baselines
  • Regularization techniques (L2 penalties, dropout, early stopping) that prevent overfitting on small datasets
  • Input windowing that transforms annual observations into supervised learning sequences (5-year windows → 1-year ahead)
  • Performance comparison tables showing whether deep learning adds value beyond traditional methods for sports analytics
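The windowing step can be shown with base R's `embed()`. The toy series below stands in for the 45 seasons of ORtg; the 5-year window with a 1-year-ahead target matches the description above.

```r
# Turn an annual series into supervised (X, y) pairs: each row of X holds
# 5 consecutive seasons and y is the following season's value.
make_windows <- function(series, window = 5) {
    m <- embed(series, window + 1)    # each row: x[t], x[t-1], ..., x[t-window]
    list(
        X = m[, (window + 1):2],      # reorder columns to oldest -> newest
        y = m[, 1]                    # one-step-ahead target
    )
}

toy <- 1:10                           # stand-in for a series of ORtg values
w <- make_windows(toy)
w$X[1, ]                              # first input window: 1 2 3 4 5
w$y[1]                                # its target: 6
```

A series of length n yields n - 5 such pairs, so 45 seasons give only 40 training rows: the core of the overfitting concern raised above.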

The transition from financial to deep learning models reflects a methodological tension in modern forecasting: parametric parsimony versus representational flexibility. GARCH models have 2-3 parameters; LSTMs have thousands. For volatile betting stocks with rich intraday data, that complexity may capture genuine structure. For annual NBA statistics with 45 observations, it may simply overfit.
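The parameter gap can be made concrete with a quick count. The sketch below uses the standard single-layer LSTM parameter formula; the hidden size of 20 is illustrative, and the exact architecture used in the next chapter may differ.

```r
# Parameter counts: GARCH(1,1) versus a single-layer LSTM (illustrative size).
garch_params <- 3                     # omega, alpha1, beta1 (plus mu in the mean eq.)

# A standard LSTM cell has 4 gates, each with an input weight matrix (h x d),
# a recurrent weight matrix (h x h), and a bias vector (h).
lstm_params <- function(h, d = 1) 4 * (h * d + h * h + h)

lstm_params(h = 20)                   # 4 * (20 + 400 + 20) = 1760
```

Even a modest hidden size of 20 on a univariate input yields well over a thousand weights, which is the scale the "1,600+ parameters" remark above refers to.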

The critical question: Does basketball’s evolution exhibit patterns too complex for ARIMA’s linear assumptions but too simple for LSTM’s deep architecture? Or do we sit in the sweet spot where both approaches offer complementary insights? The empirical comparison tests whether sports analytics has entered the deep learning era or whether classical time series methods remain the gold standard for data-scarce domains.

Where GARCH asked “How does volatility evolve?”, deep learning asks “What temporal patterns can we learn without assuming functional form?” Both frameworks forecast the future, but from opposite epistemological starting points: theory-driven specification versus data-driven discovery.